Molecular models of multiple sclerosis severity identify heterogeneity of pathogenic mechanisms

While autopsy studies identify many abnormalities in the central nervous system (CNS) of subjects dying with neurological diseases, without their quantification in living subjects across the lifespan, pathogenic processes cannot be differentiated from epiphenomena. Using machine learning (ML), we searched for likely pathogenic mechanisms of multiple sclerosis (MS). We aggregated cerebrospinal fluid (CSF) biomarkers from 1305 proteins, measured blindly in the training dataset of untreated MS patients (N = 129), into models that predict past and future speed of disability accumulation across all MS phenotypes. Data from healthy volunteers (N = 24) differentiated natural aging and sex effects from MS-related mechanisms. The resulting models, validated (Rho 0.40-0.51, p < 0.0001) in an independent longitudinal cohort (N = 98), uncovered intra-individual molecular heterogeneity. While candidate pathogenic processes must be validated in successful clinical trials, measuring them in living people will enable screening drugs for desired pharmacodynamic effects. This will facilitate drug development, hopefully making it more efficient and successful.


All relevant raw data supporting key findings of this study are available within this article and its Supplementary Information.
We report sex in our study, and we performed sex adjustment of SOMAmer levels, reflecting the biology associated with sexual dimorphism.
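The text does not specify how the sex adjustment was computed. As one possible illustration only, the sketch below centers each measurement on the mean of healthy volunteers of the same sex. The function name `sex_adjust`, the mean-centering approach, and all numbers are assumptions made for this example, not the authors' procedure.

```python
from statistics import mean


def sex_adjust(values, sexes, hv_values, hv_sexes):
    """Subtract the sex-specific healthy-volunteer (HV) mean from each value.

    Hypothetical helper: mean-centering on HV data is an assumed,
    simplified form of adjustment, not the method used in the study.
    """
    # Mean biomarker level among healthy volunteers, computed per sex.
    hv_mean = {
        sex: mean(v for v, s in zip(hv_values, hv_sexes) if s == sex)
        for sex in set(hv_sexes)
    }
    # Express every measurement relative to the HV mean for that sex.
    return [v - hv_mean[s] for v, s in zip(values, sexes)]


# Made-up numbers for a single biomarker.
hv_levels = [10.0, 12.0, 20.0, 22.0]
hv_sex = ["F", "F", "M", "M"]
patient_levels = [13.0, 19.0]
patient_sex = ["F", "M"]

adjusted = sex_adjust(patient_levels, patient_sex, hv_levels, hv_sex)
print(adjusted)
```

In this toy case the female and male healthy-volunteer means are 11.0 and 21.0, so the adjusted values express each patient measurement as a deviation from the sex-matched healthy baseline.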
Two populations of subjects were prospectively recruited:
1. Subjects with a definitive diagnosis of multiple sclerosis who were not in exacerbation, had not been on low-efficacy disease-modifying therapy within 3 months of lumbar puncture, and had not been on high-efficacy disease-modifying therapy within 6 months of lumbar puncture. The average age of this population was 49.7 (SD 11.7) years, with a 54% to 46% female-to-male distribution.
2. Healthy volunteers who lacked a neurological diagnosis or systemic disease that could influence neurological disability or brain MRI, and whose vital signs were in the normal range during the initial screening. The average age of this population was 41.6 (SD 11.7) years, with a 52% to 48% female-to-male distribution.
Subjects were recruited prospectively as part of the Natural History protocol "Comprehensive Multimodal Analysis of Neuroimmunological Diseases of the Central Nervous System" (Clinicaltrials.gov identifier NCT00794352). The study was reviewed and approved by the Intramural Institutional Review Board at the National Institutes of Health.
No power calculation was performed. All available samples were randomly split into training and validation cohorts (2:1). Upon revision, we added additional samples to the validation cohort and observed minimal effect on model performance in that cohort (a low p-value was achieved in both the original and the extended validation cohort).

No data were excluded.
A single set of clinical, volumetric MRI, and CSF biomarker values was collected for each patient/visit. No replication was attempted for these outcomes. However, the final CSF biomarker-based models generated in the training dataset were validated in the independent validation cohort, which did not contribute in any way to the development of the models. Moreover, adding more samples to the validation cohort did not significantly affect model performance.
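Validation performance is reported in this paper as Rho, which we take here to mean Spearman's rank correlation between model predictions and measured outcomes. The following minimal sketch computes that statistic from scratch, with tied values receiving their average rank. The function names and the toy numbers are illustrative assumptions and do not come from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole group of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, observed):
    """Spearman's Rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(observed))


# Toy validation data (made up): model output vs. measured outcome.
predicted = [0.2, 0.5, 0.9, 1.4, 2.0]
observed = [1.0, 0.8, 2.5, 2.4, 4.0]
rho = spearman_rho(predicted, observed)
print(round(rho, 3))
```

Because only ranks enter the calculation, the statistic is insensitive to monotonic rescaling of either the predictions or the outcomes.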
The blinded data were randomly split into training and validation cohorts (2:1) before any analyses were performed.
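A random 2:1 split made before any analysis can be sketched as follows. The splitting procedure actually used is not described here, so the function `split_cohort`, the fixed seed, and the coded identifiers are all hypothetical and serve only to illustrate a reproducible split.

```python
import random


def split_cohort(sample_ids, train_fraction=2 / 3, seed=0):
    """Randomly split coded sample IDs into training and validation lists.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    ids = sorted(sample_ids)  # sort first so the result depends only on the seed
    rng = random.Random(seed)  # private generator; global random state untouched
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 30 made-up coded identifiers that carry no clinical metadata.
samples = [f"S{i:03d}" for i in range(1, 31)]
train, validation = split_cohort(samples)
print(len(train), len(validation))
```

Splitting on coded identifiers alone keeps the assignment independent of clinical or proteomic values, which is the property the blinding described above is meant to protect.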
Clinical, imaging, and demographic data were collected prospectively before the proteomic Somascan assay was performed on CSF samples. The data were quality controlled and locked in the research database. The Somascan assay was performed on coded samples by personnel blinded to any metadata associated with the samples.

March 2021
We performed one set of structural MRI scans (T1-weighted and T2-weighted images) at each clinic visit for each subject.
Structural imaging was performed at 1.5T and 3T using T1 magnetization-prepared rapid gradient-echo (MPRAGE) or fast spoiled gradient-echo (FSPGR) and T2-weighted three-dimensional fluid-attenuated inversion recovery (3D FLAIR) sequences covering the whole brain. Raw, unprocessed, but locally anonymized and encrypted T1-MPRAGE or T1-FSPGR and T2-3D FLAIR DICOM files, ideally with 1 mm³ isotropic resolution, were uploaded to the QMENTA platform as input sequences. LesionTOADS, now implemented in the cloud-based service, is a fully automated segmentation algorithm that uses multichannel MRI data. The uploaded sequences are anterior commissure-posterior commissure (ACPC) aligned, rigidly registered to each other, and skull stripped (the T1 image is additionally bias-field corrected). Segmentation uses an atlas-based technique that combines a topological and a statistical atlas, yielding a computed volume in mm³ for each segmented tissue.
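Once a segmentation assigns a tissue label to every voxel, a per-tissue volume in mm³ follows from the voxel count multiplied by the volume of a single voxel. The sketch below shows only that final arithmetic step and is not the LesionTOADS implementation. The function name, the integer label codes, and the toy label map are assumptions.

```python
from collections import Counter


def tissue_volumes_mm3(labels, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Return {label: volume in mm^3} for a flat sequence of voxel labels.

    Hypothetical helper: the label codes are arbitrary, and the default
    voxel size assumes 1 mm isotropic resolution.
    """
    dx, dy, dz = voxel_size_mm
    voxel_volume = dx * dy * dz  # volume of one voxel in mm^3
    counts = Counter(labels)  # number of voxels carrying each label
    return {label: n * voxel_volume for label, n in counts.items()}


# Toy label map, flattened: 0 = background, 1 and 2 = two tissue classes.
label_map = [0, 0, 1, 1, 1, 2]
volumes = tissue_volumes_mm3(label_map)
print(volumes)
```

With anisotropic voxels, the same call takes the true spacing, for example `tissue_volumes_mm3(label_map, (1.0, 1.0, 3.0))`.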